Takehome-Ex 3: Exploratory Spatial Data Analysis for Potential Push & Pull Factors of Locations in Singapore using Public Bus Data

Author

Victoria Grace ANN

Published

March 11, 2024

Modified

March 26, 2024

Overview

In this exercise, my focus would be on prototyping exploratory spatial data analysis, particularly density of spatial assets within hexagonal traffic analysis zones. In conventional geography, a traffic analysis zone is the unit most commonly used in transportation planning models and the size of it varies. Hexagonal traffic analysis zones has gained traction as the hexagons of the study area have a uniform size which are easily comparable with each other when determining transport attractiveness. It is also recommended that hexagon radius should be 125m for areas in high urbanisation and 250m in areas with less urbanisation (Chmielewski et al., 2020).

The data preparation for this purpose also supports the preparation for calibrating spatial interaction models. For the prototyping, only exploratory spatial data analysis will be done.

Package Installation

I will load up the below R packages for the following purposes:

  • tmap: to visualise spatial data by creating thematic maps that could be interactive or static.

  • sf: to manipulate spatial data.

  • sp: to manipulate spatial data (older than sp).

  • reshape2: to reshape and transform data frames; converting data between “wide” and “long” formats using functions like melt().

  • ggplot2: to visualise data with plot types including bar plots.

  • tidyverse: to clean and transform data and contains sub-packagess including tidyr, dplyr and readr.

  • stplanr: to provide functions suitable for working with spatial transport data like networks, orgins-destinations matrices and travel time matrices; builds on the capabilities of sf and sp packages.

  • knitr: to combine code into a single documents that can be easily converted to various output formats like html, pdf, or Word.

  • kableExtra: to further style any Kables.
pacman::p_load(tmap, sf, sp, reshape2, ggplot2, ggpubr, tidyverse, stplanr, knitr, kableExtra)

Data Preparation

  • URA’s Masterplan Subzone 2019 Layer in shapefile format

  • Bus Stop Locations extracted from LTA

  • A tabulated bus passenger flow for Nov 2023, Dec 2023 and Jan 2024 from LTA dynamic data mall

  • Population Data for 2023 from SingStat

  • Schools from MOE

  • Financial Services

  • Hospitals, polyclinics and CHAS clinics derived from Google Maps

Subzone Layer

Read the subzone layer

mpsz <- st_read("data/geospatial/master-plan-2019-subzone-boundary-no-sea-kml.kml")

Convert mpsz to 2D geometry.

mpsz <- st_zm(mpsz, drop = TRUE) # Convert 3D geometries to 2D

Extract the SUBZONE_N and PLN_AREA_N from the Dscrptn field

For SUBZONE_N,

mpsz <- mpsz %>%
  rowwise() %>%
  mutate(SUBZONE_N = str_extract(Description, "<th>SUBZONE_N</th> <td>(.*?)</td>")) %>%
  ungroup()

mpsz$SUBZONE_N <- str_remove_all(mpsz$SUBZONE_N, "<.*?>|SUBZONE_N")

For PLN_AREA_N,

mpsz <- mpsz %>%
  rowwise() %>%
  mutate(PLN_AREA_N = str_extract(Description, "<th>PLN_AREA_N</th> <td>(.*?)</td>")) %>%
  ungroup()

mpsz$PLN_AREA_N <- str_remove_all(mpsz$PLN_AREA_N, "<.*?>|PLN_AREA_N")

Remove the Description column

mpsz$Description <- NULL

Create the shapefile

st_write(mpsz, "data/geospatial/mpsz_sf.shp")

Read the updated shapefile.

mpsz_sf <- st_read("data/geospatial/mpsz_sf.shp")
Reading layer `mpsz_sf' from data source 
  `C:\guacodemoleh\IS415-GAA\Take-home_Ex\Take-home_Ex03\data\geospatial\mpsz_sf.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84

Creating Spatial Grids

mpsz3414 <- st_transform(mpsz_sf, 3414)
outer_islands <- c("SEMAKAU", "SUDONG", "NORTH-EASTERN ISLANDS", "SOUTHERN GROUP")
mpsz3414 <- mpsz3414 %>%
  filter(!str_trim(SUBZONE_N) %in% str_trim(outer_islands))
hex_grid <- st_make_grid(mpsz3414, cellsize = c(750, 750), what = "polygons", square = FALSE) %>%
  st_sf() %>%
  # Apply as.factor() since index will be used as the identifier to link to other data sets
  mutate(index = as.factor(row_number()))

# Create border of Singapore's land area
mpsz_border <- mpsz3414 %>% st_union()

# Clip the hexagon grid layer
hex_grid_bounded <- st_intersection(hex_grid, mpsz_border)
# Check if hex grid intersects any polygons using st_intersects
# Returns a list of intersecting hexagons
intersection_list = hex_grid$index[lengths(st_intersects(hex_grid, hex_grid_bounded)) > 0]

# Filter for the intersecting hexagons
hex_grid_bounded2 = hex_grid %>%
  filter(index %in% intersection_list)

tm_shape(hex_grid_bounded2) +
  tm_polygons(alpha = 0.2) +
  tm_basemap("OpenStreetMap")

  • The map above now shows the complete analytical hexagon data of 375m (perpendicular distance between the centre of hexagon and its edges) that represents the TAZ.

Map over SUBZONE_N and PLN_AREA_N to hex_grid_bounded2.

joined <- st_join(hex_grid_bounded2, mpsz3414, join = st_intersects, left = FALSE)

aggregated <- joined %>%
  group_by(index) %>%
  summarise(SUBZONE_N = first(SUBZONE_N))

hex_grid_bounded2$SUBZONE_N <- aggregated$SUBZONE_N

hex_grid_bounded2 <- hex_grid_bounded2 %>%
  mutate(index = as.factor(row_number()))

hex_grid_bounded2 <- hex_grid_bounded2[, c("SUBZONE_N", setdiff(names(hex_grid_bounded2), "SUBZONE_N"))]

tm_shape(hex_grid_bounded2) +
  tm_fill(col = "SUBZONE_N",
          alpha = 0.7)

aggregated <- joined %>%
  group_by(index) %>%
  summarise(PLN_AREA_N = first(PLN_AREA_N))

hex_grid_bounded2$PLN_AREA_N <- aggregated$PLN_AREA_N

hex_grid_bounded2 <- hex_grid_bounded2 %>%
  mutate(index = as.factor(row_number()))

hex_grid_bounded2 <- hex_grid_bounded2[, c("PLN_AREA_N", setdiff(names(hex_grid_bounded2), "PLN_AREA_N"))]

tm_shape(hex_grid_bounded2) +
  tm_fill(col = "PLN_AREA_N",
          alpha = 0.7)

tmap_mode("plot")

Bus Stop Locations

BusStop <- read.csv("data/aspatial/bus_coords_subzone_v2.csv") %>% st_as_sf(coords=c("Longitude", "Latitude"), crs=4326) %>%
  st_transform(crs=3414)

Compute Bus Stop Density

Using st_intersects(), we can intersect the bus stops layer and the hexagon layer and use lengths() to count the number of bus stops that lie inside each Traffic Analysis Zone. These count values are appended back to each spatial grid and encapsulated into a new column called busstop_count in a duplicated dataframe, hex_grid_bounded2.

hex_grid_bounded2$busstop_count <- lengths(st_intersects(hex_grid_bounded2, BusStop))
tm_shape(hex_grid_bounded2) +
  tm_fill(col = "busstop_count",
          palette = "Blues",
          style = "cont", 
          title = "Bus Stop Density") + 
  tm_borders(col = "grey") +
  tm_legend(position = c("RIGHT", "BOTTOM"))

tail(hex_grid_bounded2,10)
Simple feature collection with 10 features and 4 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 49542.54 ymin: 30108.73 xmax: 56667.54 ymax: 38768.98
Projected CRS: SVY21 / Singapore TM
     PLN_AREA_N      SUBZONE_N                       geometry index
1664 CHANGI BAY     CHANGI BAY POLYGON ((49917.54 34005.84...  1664
1665 CHANGI BAY     CHANGI BAY POLYGON ((49917.54 36603.92...  1665
1666     CHANGI CHANGI AIRPORT POLYGON ((49917.54 37902.96...  1666
1667 CHANGI BAY     CHANGI BAY POLYGON ((50292.54 33356.32...  1667
1668 CHANGI BAY     CHANGI BAY POLYGON ((50292.54 37253.44...  1668
1669 CHANGI BAY     CHANGI BAY POLYGON ((51042.54 37253.44...  1669
1670 CHANGI BAY     CHANGI BAY POLYGON ((51417.54 36603.92...  1670
1671 CHANGI BAY     CHANGI BAY POLYGON ((54417.54 30108.73...  1671
1672 CHANGI BAY     CHANGI BAY POLYGON ((55542.54 33356.32...  1672
1673 CHANGI BAY     CHANGI BAY POLYGON ((56292.54 33356.32...  1673
     busstop_count
1664             0
1665             1
1666             0
1667             0
1668             0
1669             0
1670             0
1671             0
1672             0
1673             0

Constructing O-D Matrix of commuter flow

od_bus_nov <- read_csv("data/OD Bus/merged_bus_nov_v2.csv")
od_bus_dec <- read_csv("data/OD Bus/merged_bus_dec_v2.csv")
od_bus_jan <- read_csv("data/OD Bus/merged_bus_jan_v2.csv")
OD <- rbind(od_bus_nov, od_bus_dec)
OD <- rbind(OD, od_bus_jan)

nrow(od_bus_nov) + nrow(od_bus_dec) + nrow(od_bus_jan) == nrow(OD) # evaluates to TRUE
str(OD)

ORIGIN_PT_CODE and DESTINATION_PT_CODE are listed as character variables. These variables should be transformed to factors so that R treats them as grouping variables.

cols_to_convert <- c("ORIGIN_PT_CODE", "DESTINATION_PT_CODE")

OD[cols_to_convert] <- lapply(OD[cols_to_convert], as.factor)

glimpse(OD)

ORIGIN_PT_CODEandDESTINATION_PT_CODE` are now factors.

write_rds(OD, "data/rds/odbus_combined.rds")
odbus_combined <- readRDS("data/rds/odbus_combined.rds")

Commuting Flow Data

od <- odbus_combined %>%
  group_by(ORIGIN_PT_CODE, DESTINATION_PT_CODE, DAY_TYPE ,TIME_PER_HOUR) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS)) %>%
  ungroup()

Geospatial Data Wrangling

Now we need to convert the od data from aspatial to geospatial data.

First, we populate the hexagon grid indexes of hex_grid_bounded2 sf data frame into BusStop sf data frame using st_intersection().

BusStop_hex <- st_intersection(BusStop, hex_grid_bounded2) %>%
  select(BusStopCode, index) %>%
  st_drop_geometry()

cols_to_convert <- c("BusStopCode")
BusStop_hex[cols_to_convert] <- lapply(BusStop_hex[cols_to_convert], as.factor)

glimpse(BusStop_hex)
Rows: 5,103
Columns: 2
$ BusStopCode <fct> 25059, 26379, 25751, 25761, 25711, 25719, 26369, 25741, 26…
$ index       <fct> 29, 31, 39, 39, 40, 40, 41, 49, 50, 50, 51, 52, 52, 59, 60…

Append the hexagon grid index from BusStop_hex data frame to the segregated od data frames for both ORIGIN_PT_CODE and DESTINATION_PT_CODE.

od_data <- left_join(od, BusStop_hex,
            by = c("ORIGIN_PT_CODE" = "BusStopCode")) %>%
  rename("ORIGIN_hex" = "index")

od_data <- left_join(od_data, BusStop_hex,
            by = c("DESTINATION_PT_CODE" = "BusStopCode")) %>%
  rename("DESTIN_hex" = "index") %>%
  drop_na() %>%
  group_by(ORIGIN_hex, DESTIN_hex, DAY_TYPE, TIME_PER_HOUR) %>%
  summarise(TOTAL_TRIPS = sum(TRIPS))

cols_to_convert <- c("ORIGIN_hex", "DESTIN_hex")

od_data[cols_to_convert] <- lapply(od_data[cols_to_convert], as.factor)
write_rds(od_data, "data/rds/od_data.rds")

Check for duplicates

duplicate <- od_data %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

DT::datatable(duplicate)

No duplicates found.

Visualisation of O-D Flows

Desire lines can be used to visualise the OD matrix, which are rays connecting a site to associated location points.

Remove intra-zonal flows

It is not meaningful to include trips that started and ended from the same point, so we will only include point pairs where the origin and destination of a trip are different.

od_plot <- od_data[od_data$ORIGIN_hex!=od_data$DESTIN_hex,]

Create desire lines

Use od2line() of stplanr package to create the desire lines.

flowLine <- od2line(flow = od_plot, 
                    zones = hex_grid_bounded2,
                    zone_code = "index")
write_rds(flowLine, "data/rds/flowLine.rds")
flowLine <- read_rds("data/rds/flowLine.rds")

Visualise desire lines

Use tmap() to visualise the resulting desire lines. For a clearer visualisation, only desire lines with at least 5000 trips are shown.

The traffic density by desire lines may vary across time. Thus, the visualisations will be more intuitive if we further segment the data by hour.

Weekday

tm_shape(hex_grid_bounded2) +
  tm_fill(col = "busstop_count",
          palette = "Blues",
          style = "cont",
          title = "Bus Stop Density") +
  tm_borders(col = "grey") +
flowLine %>%  
  filter(DAY_TYPE == "WEEKDAY", TOTAL_TRIPS >= 5000, TIME_PER_HOUR == 6) %>%
tm_shape() +
  tm_lines(lwd = "TOTAL_TRIPS",
           style = "fixed", 
           scale = c(1,2,3,4,5,7,9),
           n = 6, 
           alpha = 0.5,
           title.lwd = "Total Trips") + 
  tm_layout(main.title = "Desire Lines with at least 5000 trips \nbetween Traffic Analysis Zones for Weekday 6am (Nov '23 - Jan '24)",
            main.title.position = "center",
            main.title.size = 0.6,
            frame = TRUE) +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 

tm_shape(hex_grid_bounded2) +
  tm_fill(col = "busstop_count",
          palette = "Blues",
          style = "cont",
          title = "Bus Stop Density") +
  tm_borders(col = "grey") +
flowLine %>%  
  filter(DAY_TYPE == "WEEKDAY", TOTAL_TRIPS >= 5000, TIME_PER_HOUR == 12) %>%
tm_shape() +
  tm_lines(lwd = "TOTAL_TRIPS",
           style = "fixed",
           scale = c(1,2,3,4,5,7,9),
           n = 6, 
           alpha = 0.5,
           title.lwd = "Total Trips") + 
  tm_layout(main.title = "Desire Lines with at least 5000 trips \nbetween Traffic Analysis Zones for Weekday 12nn (Nov '23 - Jan '24)",
            main.title.position = "center",
            main.title.size = 0.6,
            frame = TRUE) +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 

tm_shape(hex_grid_bounded2) +
  tm_fill(col = "busstop_count", 
          palette = "Blues",
          style = "cont",
          title = "Bus Stop Density") +
  tm_borders(col = "grey") +
flowLine %>%  
  filter(DAY_TYPE == "WEEKDAY", TOTAL_TRIPS >= 5000, TIME_PER_HOUR == 18) %>%
tm_shape() +
  tm_lines(lwd = "TOTAL_TRIPS",
           style = "fixed",
           scale = c(1,2,3,4,5,7,9),
           n = 6, 
           alpha = 0.5,
           title.lwd = "Total Trips") + 
  tm_layout(main.title = "Desire Lines with at least 5000 trips \nbetween Traffic Analysis Zones for Weekday 6pm (Nov '23 - Jan '24)",
            main.title.position = "center",
            main.title.size = 0.6,
            frame = TRUE) +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 

tm_shape(hex_grid_bounded2) +
  tm_fill(col = "busstop_count",
          palette = "Blues",
          style = "cont",
          title = "Bus Stop Density") +
  tm_borders(col = "grey") +
flowLine %>%  
  filter(DAY_TYPE == "WEEKDAY", TOTAL_TRIPS >= 5000, TIME_PER_HOUR == 0) %>%
tm_shape() +
  tm_lines(lwd = "TOTAL_TRIPS",
           style = "fixed",
           scale = c(1,2,3,4,5,7,9),
           n = 6, 
           alpha = 0.5,
           title.lwd = "Total Trips") + 
  tm_layout(main.title = "Desire Lines with at least 5000 trips \nbetween Traffic Analysis Zones for Weekday 12am (Nov '23 - Jan '24)",
            main.title.position = "center",
            main.title.size = 0.6,
            frame = TRUE) +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 

  • It is logical that there is less bus traffic in the weehours since most of the population is either asleep or not working at this hour.

Weekend/Holiday

tm_shape(hex_grid_bounded2) +
  tm_fill(col = "busstop_count",
          palette = "Blues",
          style = "cont",
          title = "Bus Stop Density") +
  tm_borders(col = "grey") +
flowLine %>%  
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY",TOTAL_TRIPS >= 5000, TIME_PER_HOUR == 6) %>%
tm_shape() +
  tm_lines(lwd = "TOTAL_TRIPS",
           style = "fixed",
           scale = c(1,2,3,4,5,7,9),
           n = 6, 
           alpha = 0.5,
           title.lwd = "Total Trips") + 
  tm_layout(main.title = "Desire Lines with at least 5000 trips \nbetween Traffic Analysis Zones for Weekday 6am (Nov '23 - Jan '24)",
            main.title.position = "center",
            main.title.size = 0.6,
            frame = TRUE) +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 

tm_shape(hex_grid_bounded2) +
  tm_fill(col = "busstop_count",
          palette = "Blues",
          style = "cont",
          title = "Bus Stop Density") +
  tm_borders(col = "grey") +
flowLine %>%  
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY", TOTAL_TRIPS >= 5000, TIME_PER_HOUR == 12) %>%
tm_shape() +
  tm_lines(lwd = "TOTAL_TRIPS",
           style = "fixed",
           scale = c(1,2,3,4,5,7,9),
           n = 6, 
           alpha = 0.5,
           title.lwd = "Total Trips") + 
  tm_layout(main.title = "Desire Lines with at least 5000 trips \nbetween Traffic Analysis Zones for Weekday 12nn (Nov '23 - Jan '24)",
            main.title.position = "center",
            main.title.size = 0.6,
            frame = TRUE) +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 

tm_shape(hex_grid_bounded2) +
  tm_fill(col = "busstop_count", 
          palette = "Blues",
          style = "cont",
          title = "Bus Stop Density") +
  tm_borders(col = "grey") +
flowLine %>%  
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY", TOTAL_TRIPS >= 5000, TIME_PER_HOUR == 18) %>%
tm_shape() +
  tm_lines(lwd = "TOTAL_TRIPS",
           style = "fixed",
           scale = c(1,2,3,4,5,7,9),
           n = 6, 
           alpha = 0.5,
           title.lwd = "Total Trips") + 
  tm_layout(main.title = "Desire Lines with at least 5000 trips \nbetween Traffic Analysis Zones for Weekend 6pm (Nov '23 - Jan '24)",
            main.title.position = "center",
            main.title.size = 0.6,
            frame = TRUE) +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 

We can also focus on individual planning areas to observe more commuting trends within the planning area.

tmap_mode("view")
hex_grid_bounded2 %>%
  filter(PLN_AREA_N == "JURONG EAST") %>% 
tm_shape() +
  tm_fill(col = "busstop_count",
          palette = "Greens",
          style = "cont",
          title = "Bus Stop Density",
          popup.vars = c("SUBZONE_N"),
          alpha = 0.2) + 
  tm_view(set.zoom.limits = c(11,14)) + 
  tm_borders(col = "grey") +
flowLine %>%  
  filter(TOTAL_TRIPS >= 5000) %>%
tm_shape() +
  tm_lines(lwd = "TOTAL_TRIPS",
           style = "fixed",
           scale = c(1,2,3,4,5,7,9),
           n = 6, 
           alpha = 0.5,
           popup.vars = c("TOTAL_TRIPS"))
tmap_mode("plot")

Computing Distance Matrix

mpsz_sp <- as(mpsz_sf, "Spatial")
dist <- spDists(mpsz_sp, 
                longlat = FALSE)
head(dist, n=c(10,10))

Label column and row headers of distance matrix

Create a list sorted by planning subzone.

sz_names <- mpsz_sf$SUBZONE_N

Attach the subzones to row and column for distance matrix matching.

colnames(dist) <- paste0(sz_names)
rownames(dist) <- paste0(sz_names)

Pivoting distance value by SUBZONE_N

Pivot the distance matrix into a long table by using the row and column subzone names.

distPair <- melt(dist) %>%
  rename(dist = value)
head(distPair, 10)
                      Var1        Var2       dist
1              MARINA EAST MARINA EAST 0.00000000
2         INSTITUTION HILL MARINA EAST 0.03528227
3           ROBERTSON QUAY MARINA EAST 0.03539590
4  JURONG ISLAND AND BUKOM MARINA EAST 0.18199800
5             FORT CANNING MARINA EAST 0.02687355
6         MARINA EAST (MP) MARINA EAST 0.01286792
7                   SUDONG MARINA EAST 0.17287059
8                  SEMAKAU MARINA EAST 0.13479457
9           SOUTHERN GROUP MARINA EAST 0.06792530
10                 SENTOSA MARINA EAST 0.05759682
  • Notice that from the first row, the within zone distance is 0.

Updating intra-zonal distances

Append a constance value to replace the intra-zonal distance of 0.

distPair %>%
  filter(dist > 0) %>%
  summary()
                      Var1                             Var2       
 MARINA EAST            :   331   MARINA EAST            :   331  
 INSTITUTION HILL       :   331   INSTITUTION HILL       :   331  
 ROBERTSON QUAY         :   331   ROBERTSON QUAY         :   331  
 JURONG ISLAND AND BUKOM:   331   JURONG ISLAND AND BUKOM:   331  
 FORT CANNING           :   331   FORT CANNING           :   331  
 MARINA EAST (MP)       :   331   MARINA EAST (MP)       :   331  
 (Other)                :107906   (Other)                :107906  
      dist         
 Min.   :0.001561  
 1st Qu.:0.064419  
 Median :0.107153  
 Mean   :0.110151  
 3rd Qu.:0.147771  
 Max.   :0.448634  
                   

Spatial Push and Pull Factors

Push factors are reasons for pushing people or passengers away from their location. Pull factors are reasons for attracting passenger to a location.

  • Population density of an area has an impact on movement patterns as higher population density areas can act as a propulsive force. The HDB data will be used as a proxy for the population density, where a greater number of units will indicate a higher population density.

  • Employment opportunities density of an area can also contribute a location’s push or pull. An area with more businesses will attract more workers to the area and thus the registered business data will be used as a proxy.

  • School Density can determine the volume of passengers commuting to an area as an area with more schools would attract more human traffic whilst an area with fewer or no schools will not attract students especially during school hours.

  • Financial Services Density can contribute to the overall attractiveness of a destination of a destination for employment by offering convenience.

  • Public Healthcare Density can reflect the utility of polyclinics and public hospitals through the traffic network.

Population Density

Importing Aspatial HDB data

Use read_csv() from the readr package to import the prepared HDB csv data.

hdb <- read_csv("data/aspatial/hdb.csv")

glimpse(hdb)
Rows: 12,442
Columns: 37
$ ...1                  <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14…
$ blk_no                <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"…
$ street                <chr> "BEACH RD", "BEDOK STH AVE 1", "CANTONMENT RD", …
$ max_floor_lvl         <dbl> 16, 14, 2, 15, 4, 25, 12, 14, 12, 2, 15, 15, 13,…
$ year_completed        <dbl> 1970, 1975, 2010, 1982, 1975, 1982, 1975, 1977, …
$ residential           <chr> "Y", "Y", "N", "Y", "Y", "Y", "Y", "Y", "Y", "N"…
$ commercial            <chr> "Y", "N", "Y", "N", "Y", "N", "N", "N", "Y", "Y"…
$ market_hawker         <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N"…
$ miscellaneous         <chr> "N", "Y", "N", "N", "N", "N", "Y", "Y", "N", "N"…
$ multistorey_carpark   <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N"…
$ precinct_pavilion     <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N"…
$ bldg_contract_town    <chr> "KWN", "BD", "CT", "BD", "PRC", "BM", "QT", "GL"…
$ total_dwelling_units  <dbl> 142, 206, 0, 102, 55, 96, 125, 247, 95, 0, 220, …
$ `1room_sold`          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ `2room_sold`          <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ `3room_sold`          <dbl> 138, 204, 0, 0, 54, 0, 118, 0, 62, 0, 216, 214, …
$ `4room_sold`          <dbl> 1, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ `5room_sold`          <dbl> 2, 2, 0, 92, 1, 96, 7, 0, 33, 0, 4, 5, 0, 4, 0, …
$ exec_sold             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ multigen_sold         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ studio_apartment_sold <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ `1room_rental`        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 319, 0, 0, 0…
$ `2room_rental`        <dbl> 0, 0, 0, 0, 0, 0, 0, 247, 0, 0, 0, 0, 0, 0, 56, …
$ `3room_rental`        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 14, 1,…
$ other_room_rental     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 14, 0,…
$ lat                   <dbl> 1.295097, 1.320852, 1.275488, 1.327969, 1.388610…
$ lng                   <dbl> 103.8541, 103.9337, 103.8414, 103.9227, 103.9881…
$ building              <chr> "RAFFLES HOTEL", "NIL", "PINNACLE @ DUXTON", "PI…
$ addr                  <chr> "1 BEACH ROAD RAFFLES HOTEL SINGAPORE 189673", "…
$ postal                <chr> "189673", "460001", "080001", "461001", "500001"…
$ SUBZONE_NO            <dbl> 2, 6, 3, 3, 1, 9, 10, 5, 3, 5, 1, 5, 2, 2, 1, 7,…
$ SUBZONE_N             <chr> "CITY HALL", "BEDOK SOUTH", "CHINATOWN", "KEMBAN…
$ SUBZONE_C             <chr> "DTSZ02", "BDSZ06", "OTSZ03", "BDSZ03", "CHSZ01"…
$ PLN_AREA_N            <chr> "DOWNTOWN CORE", "BEDOK", "OUTRAM", "BEDOK", "CH…
$ PLN_AREA_C            <chr> "DT", "BD", "OT", "BD", "CH", "BM", "QT", "GL", …
$ REGION_N              <chr> "CENTRAL REGION", "EAST REGION", "CENTRAL REGION…
$ REGION_C              <chr> "CR", "ER", "CR", "ER", "ER", "CR", "CR", "CR", …

For the purpose of computing a proxy for population density, the residential units will be extracted using filter() from the dplyr package.

hdb_residential <- hdb %>%
  filter(residential == "Y")

head(hdb_residential, 10)
# A tibble: 10 × 37
    ...1 blk_no street       max_floor_lvl year_completed residential commercial
   <dbl> <chr>  <chr>                <dbl>          <dbl> <chr>       <chr>     
 1     0 1      BEACH RD                16           1970 Y           Y         
 2     1 1      BEDOK STH A…            14           1975 Y           N         
 3     3 1      CHAI CHEE RD            15           1982 Y           N         
 4     4 1      CHANGI VILL…             4           1975 Y           Y         
 5     5 1      DELTA AVE               25           1982 Y           N         
 6     6 1      DOVER RD                12           1975 Y           N         
 7     7 1      EUNOS CRES              14           1977 Y           N         
 8     8 1      EVERTON PK              12           1980 Y           Y         
 9    10 1      GHIM MOH RD             15           1975 Y           N         
10    11 1      HAIG RD                 15           1976 Y           N         
# ℹ 30 more variables: market_hawker <chr>, miscellaneous <chr>,
#   multistorey_carpark <chr>, precinct_pavilion <chr>,
#   bldg_contract_town <chr>, total_dwelling_units <dbl>, `1room_sold` <dbl>,
#   `2room_sold` <dbl>, `3room_sold` <dbl>, `4room_sold` <dbl>,
#   `5room_sold` <dbl>, exec_sold <dbl>, multigen_sold <dbl>,
#   studio_apartment_sold <dbl>, `1room_rental` <dbl>, `2room_rental` <dbl>,
#   `3room_rental` <dbl>, other_room_rental <dbl>, lat <dbl>, lng <dbl>, …

There are also some outliers like hotels that are classified as a residential unit. We can remove rows containing ‘hotel’ using grepl().

hotels <- hdb_residential %>%
  filter(grepl("HOTEL", building, ignore.case = TRUE))

kable(hotels)
…1 blk_no street max_floor_lvl year_completed residential commercial market_hawker miscellaneous multistorey_carpark precinct_pavilion bldg_contract_town total_dwelling_units 1room_sold 2room_sold 3room_sold 4room_sold 5room_sold exec_sold multigen_sold studio_apartment_sold 1room_rental 2room_rental 3room_rental other_room_rental lat lng building addr postal SUBZONE_NO SUBZONE_N SUBZONE_C PLN_AREA_N PLN_AREA_C REGION_N REGION_C
0 1 BEACH RD 16 1970 Y Y N N N N KWN 142 0 1 138 1 2 0 0 0 0 0 0 0 1.295097 103.8541 RAFFLES HOTEL 1 BEACH ROAD RAFFLES HOTEL SINGAPORE 189673 189673 2 CITY HALL DTSZ02 DOWNTOWN CORE DT CENTRAL REGION CR
4580 3 BEACH RD 16 1970 Y Y N N N N KWN 138 0 1 134 0 3 0 0 0 0 0 0 0 1.294801 103.8545 RAFFLES HOTEL SINGAPORE 3 BEACH ROAD RAFFLES HOTEL SINGAPORE SINGAPORE 189674 189674 2 CITY HALL DTSZ02 DOWNTOWN CORE DT CENTRAL REGION CR

The HDB Blk 1 Beach Road shares a similar address as Raffles Hotel’s 1 Beach Road, but they have different postal codes.

To verify other similar addresses, filter for addresses containing “BEACH RD”.

beach_rd <- hdb_residential %>%
  filter(grepl("BEACH RD", street, ignore.case = TRUE))

kable(beach_rd)
…1 blk_no street max_floor_lvl year_completed residential commercial market_hawker miscellaneous multistorey_carpark precinct_pavilion bldg_contract_town total_dwelling_units 1room_sold 2room_sold 3room_sold 4room_sold 5room_sold exec_sold multigen_sold studio_apartment_sold 1room_rental 2room_rental 3room_rental other_room_rental lat lng building addr postal SUBZONE_NO SUBZONE_N SUBZONE_C PLN_AREA_N PLN_AREA_C REGION_N REGION_C
0 1 BEACH RD 16 1970 Y Y N N N N KWN 142 0 1 138 1 2 0 0 0 0 0 0 0 1.295097 103.8541 RAFFLES HOTEL 1 BEACH ROAD RAFFLES HOTEL SINGAPORE 189673 189673 2 CITY HALL DTSZ02 DOWNTOWN CORE DT CENTRAL REGION CR
1660 15 BEACH RD 20 1974 Y Y N N N N KWN 76 0 0 0 76 0 0 0 0 0 0 0 0 1.295796 103.8555 NIL 15 BEACH ROAD NIL 1 BUGIS DTSZ01 DOWNTOWN CORE DT CENTRAL REGION CR
2079 17 BEACH RD 20 1974 Y Y N N N N KWN 76 0 0 0 76 0 0 0 0 0 0 0 0 1.303689 103.8636 GOLDEN BEACH VISTA 17 BEACH ROAD GOLDEN BEACH VISTA SINGAPORE 190017 190017 9 CRAWFORD KLSZ09 KALLANG KL CENTRAL REGION CR
2567 2 BEACH RD 16 1970 Y Y N N N N KWN 139 0 1 136 0 2 0 0 0 0 0 0 0 1.390462 103.9753 CHANGI BEACH CLUB 2 ANDOVER ROAD CHANGI BEACH CLUB SINGAPORE 509984 509984 1 CHANGI POINT CHSZ01 CHANGI CH EAST REGION ER
4580 3 BEACH RD 16 1970 Y Y N N N N KWN 138 0 1 134 0 3 0 0 0 0 0 0 0 1.294801 103.8545 RAFFLES HOTEL SINGAPORE 3 BEACH ROAD RAFFLES HOTEL SINGAPORE SINGAPORE 189674 189674 2 CITY HALL DTSZ02 DOWNTOWN CORE DT CENTRAL REGION CR
6028 4 BEACH RD 16 1968 Y N N Y N N KWN 336 0 0 0 0 0 0 0 0 336 0 0 0 1.304716 103.8652 NIL 4 BEACH ROAD SINGAPORE 190004 190004 9 CRAWFORD KLSZ09 KALLANG KL CENTRAL REGION CR
7743 5 BEACH RD 16 1968 Y N N Y N N KWN 336 0 0 0 0 0 0 0 0 336 0 0 0 1.298092 103.8569 BEACH ROAD CONSERVATION AREA 5 TAN QUEE LAN STREET BEACH ROAD CONSERVATION AREA SINGAPORE 188094 188094 1 BUGIS DTSZ01 DOWNTOWN CORE DT CENTRAL REGION CR
8956 6 BEACH RD 16 1968 Y Y N N N N KWN 198 0 45 1 28 0 0 0 0 57 67 0 0 1.303992 103.8644 BEACH ROAD GARDENS 6 BEACH ROAD BEACH ROAD GARDENS SINGAPORE 190006 190006 9 CRAWFORD KLSZ09 KALLANG KL CENTRAL REGION CR

2, 5 and 15 Beach Road do not have the correct postal codes following the 1900XX convention. Additionally, these addresses do not have the correct coordinates too.

With reference to URA’s official asset map of Singapore, OneMap, the data will be manually modified using mutate() and ifelse().

hdb_residential2 <- hdb_residential %>%
  mutate(postal = ifelse(blk_no == 1 & street == "BEACH RD", 190001, postal)) %>%
  mutate(lat = ifelse(blk_no == 1 & street == "BEACH RD", 1.3036714, lat)) %>%
  mutate(lng = ifelse(blk_no == 1 & street == "BEACH RD", 103.8644787, lng)) %>%
  mutate(postal = ifelse(blk_no == 2 & street == "BEACH RD", 190002, postal)) %>%
  mutate(lat = ifelse(blk_no == 2 & street == "BEACH RD", 1.3040331, lat)) %>%
  mutate(lng = ifelse(blk_no == 2 & street == "BEACH RD", 103.8649285, lng)) %>%
  mutate(postal = ifelse(blk_no == 3 & street == "BEACH RD", 190003, postal)) %>%
  mutate(lat = ifelse(blk_no == 3 & street == "BEACH RD", 1.3041872, lat)) %>%
  mutate(lng = ifelse(blk_no == 3 & street == "BEACH RD", 103.8651934, lng)) %>%
  mutate(postal = ifelse(blk_no == 5 & street == "BEACH RD", 190005, postal)) %>%
  mutate(lat = ifelse(blk_no == 5 & street == "BEACH RD", 1.3043463, lat)) %>%
  mutate(lng = ifelse(blk_no == 5 & street == "BEACH RD", 103.8648158, lng)) %>%
  mutate(postal = ifelse(blk_no == 15 & street == "BEACH RD", 190015, postal)) %>%
  mutate(lat = ifelse(blk_no == 15 & street == "BEACH RD", 1.3034254, lat)) %>%
  mutate(lng = ifelse(blk_no == 15 & street == "BEACH RD", 103.8631535, lng))

Check for any duplicates.

duplicate <- hdb_residential2 %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

DT::datatable(duplicate)

Converting Aspatial data to Geospatial data

Longitude and latitude values are in decimal degrees and thus the data is in wgs84 geographic coordinate system.

To convert hdb_residential2 into sf, we use st_as_sf() and set the crs argument to 4326 first. The transformation to Singapore’s coordinate reference system 3414 will be done with st_transform().

We only need the postal code, total dwelling units and geometry attributes so we will use the select() function to extract these columns.

hdb_residential_sf <- st_as_sf(hdb_residential2, 
                   coords = c("lng", "lat"),
                   crs=4326) %>%
  st_transform(crs = 3414) %>%
  select(postal, total_dwelling_units, geometry)

str(hdb_residential_sf)
sf [10,181 × 3] (S3: sf/tbl_df/tbl/data.frame)
 $ postal              : chr [1:10181] "190001" "460001" "461001" "500001" ...
 $ total_dwelling_units: num [1:10181] 142 206 102 55 96 125 247 95 220 219 ...
 $ geometry            :sfc_POINT of length 10181; first list element:  'XY' num [1:2] 31468 31779
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA
  ..- attr(*, "names")= chr [1:2] "postal" "total_dwelling_units"

Let’s build the HDB residential population density map.

Performing in-polygon count

housing_count <- st_join(hex_grid_bounded2, hdb_residential_sf, 
                     join = st_intersects, left = TRUE) %>%
  st_drop_geometry() %>%
  group_by(index) %>%
  summarise(housing_count = sum(total_dwelling_units)) %>%
  ungroup() %>%
  mutate(housing_count = ifelse(is.na(housing_count), 0, housing_count))
hex_grid_bounded3 <- left_join(hex_grid_bounded2, housing_count,
                               by = c("index" = "index"))

summary(hex_grid_bounded3$housing_count)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0       0       0     652       0    7879 

  • Housing density is greatest in the northeast, particularly the newer town areas like Punggol.

Employment Opportunities Density

Import Geospatial data: Business

biz <- st_read("data/geospatial/Business.shp") %>% st_transform(crs=3414)
Reading layer `Business' from data source 
  `C:\guacodemoleh\IS415-GAA\Take-home_Ex\Take-home_Ex03\data\geospatial\Business.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 6550 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3669.148 ymin: 25408.41 xmax: 47034.83 ymax: 50148.54
Projected CRS: SVY21 / Singapore TM

Businesses are concentrated in the central and west regions.

Perform point-in-polygon count

hex_grid_bounded3$biz_count <- lengths(st_intersects(hex_grid_bounded3, biz))

summary(hex_grid_bounded3$biz_count)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   0.000   0.000   3.915   2.000 106.000 
tm_shape(hex_grid_bounded3) +
  tm_fill(col = "biz_count",
          palette = "Blues",
          style = "cont", 
          title = "Business Density") + 
  tm_borders(col = "grey")  +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)

tmap_mode("plot")

Schools Density

schools <- read_csv("data/aspatial/schoolsclean.csv")
schools <- schools %>%
  separate(latlong, into = c("latitude", "longitude"), sep = ",", convert = TRUE)

schools_sf <- st_as_sf(schools, coords = c("longitude","latitude"), crs = 4326) %>% 
  st_transform(crs=3414)
tm_shape(hex_grid_bounded3) + 
  tm_fill(col = "busstop_count",
          palette = "Blues",
          style = "cont",
          title = "Bus Stop Density") +
  tm_borders(col = "grey") +
tm_shape(schools_sf) + 
  tm_dots() + 
  tm_layout(main.title = "Location of Schools",
            main.title.position = "center",
            main.title.size = 1,
            frame = TRUE)  +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)

Perform point-in-polygon count

hex_grid_bounded3$school_count <- lengths(st_intersects(hex_grid_bounded3, schools_sf))

summary(hex_grid_bounded3$school_count)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.0000  0.0000  0.1943  0.0000  4.0000 
tm_shape(hex_grid_bounded3) +
  tm_fill(col = "school_count",
          palette = "Blues",
          style = "cont", 
          title = "School Density") + 
  tm_borders(col = "grey")  +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)

Financial Services Density

Importing Geospatial data: Financial Services

FinServ <- st_read(dsn = "data/geospatial", layer = "FinServ") %>%
  st_transform(crs = 3414)
Reading layer `FinServ' from data source 
  `C:\guacodemoleh\IS415-GAA\Take-home_Ex\Take-home_Ex03\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 3320 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 4881.527 ymin: 25171.88 xmax: 46526.16 ymax: 49338.02
Projected CRS: SVY21 / Singapore TM
tm_shape(hex_grid_bounded3) + 
  tm_fill(col = "busstop_count",
          palette = "Blues",
          style = "cont",
          title = "Bus Stop Density") +
  tm_borders(col = "grey") +
tm_shape(FinServ) + 
  tm_dots() + 
  tm_layout(main.title = "Location of Financial Services",
            main.title.position = "center",
            main.title.size = 1,
            frame = TRUE)  +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)

Perform point-in-polygon count

hex_grid_bounded3$fin_count <- lengths(st_intersects(hex_grid_bounded3, FinServ))

summary(hex_grid_bounded3$fin_count)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   0.000   0.000   1.984   1.000 117.000 
tm_shape(hex_grid_bounded3) +
  tm_fill(col = "fin_count",
          palette = "Blues",
          style = "cont", 
          title = "Financial Services Density") + 
  tm_borders(col = "grey")  +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)

Public Healthcare Density

public_hc <- read_csv("data/aspatial/HospitalsPolyclinics v_2024.csv")
public_hc.sf <- st_as_sf(public_hc[1:42,], wkt = "geometry", crs = 4326) %>% 
  st_transform(crs=3414)

public_hc2.sf <- st_as_sf(public_hc[43:1235,], wkt = "geometry", crs = 3414) # CHAS clinics encoded in EPSG 3414 

public_hc.sf <- rbind(public_hc.sf, public_hc2.sf)

#write_rds(public_hc.sf, "data/geospatial/public_hc.sf")

Perform point-in-polygon count

hex_grid_bounded3$hc_count <- lengths(st_intersects(hex_grid_bounded3, public_hc.sf))

summary(hex_grid_bounded3$hc_count)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.0000  0.0000  0.7376  0.0000 17.0000 
tm_shape(hex_grid_bounded3) +
  tm_fill(col = "busstop_count",
          palette = "Blues",
          style = "cont",
          title = "Bus Stop Density") +
  tm_borders(col = "grey") +
  
tm_shape(public_hc.sf) +
  tm_dots() + 
  tm_layout(main.title = "Location of Affordable Healthcare Services",
            main.title.position = "center",
            main.title.size = 1,
            frame = TRUE)  +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)

tm_shape(hex_grid_bounded3) +
  tm_fill(col = "hc_count",
          palette = "Blues",
          style = "cont", 
          title = "Public Healthcare Services Density") + 
  tm_borders(col = "grey")  +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)

Leisure & Recreation Density

leisure_recre <- st_read("data/geospatial/Liesure&Recreation.shp") %>% st_transform(crs=3414)
Reading layer `Liesure&Recreation' from data source 
  `C:\guacodemoleh\IS415-GAA\Take-home_Ex\Take-home_Ex03\data\geospatial\Liesure&Recreation.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 1217 features and 30 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6010.495 ymin: 25134.28 xmax: 48439.77 ymax: 50078.88
Projected CRS: SVY21 / Singapore TM
tm_shape(hex_grid_bounded3) +
  tm_fill(col = "busstop_count",
          palette = "Blues",
          style = "cont",
          title = "Bus Stop Density") +
  tm_borders(col = "grey") +
  
tm_shape(leisure_recre) +
  tm_dots() + 
  tm_layout(main.title = "Location of Leisure & Recreation",
            main.title.position = "center",
            main.title.size = 1,
            frame = TRUE)  +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)

Perform point-in-polygon count

hex_grid_bounded3$leisure_recre_count <- lengths(st_intersects(hex_grid_bounded3, leisure_recre))

summary(hex_grid_bounded3$leisure_recre_count)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.0000  0.0000  0.7256  1.0000 27.0000 
tm_shape(hex_grid_bounded3) +
  tm_fill(col = "leisure_recre_count",
          palette = "Blues",
          style = "cont", 
          title = "Leisure & Recreational Density") + 
  tm_borders(col = "grey")  +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)

Food & Beverage (F&B)

food_bev <- st_read("data/geospatial/F&B.shp") %>% st_transform(crs=3414)
Reading layer `F&B' from data source 
  `C:\guacodemoleh\IS415-GAA\Take-home_Ex\Take-home_Ex03\data\geospatial\F&B.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 1919 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6010.495 ymin: 25343.27 xmax: 45462.43 ymax: 48796.21
Projected CRS: SVY21 / Singapore TM
tm_shape(hex_grid_bounded3) +
  tm_fill(col = "busstop_count",
          palette = "Blues",
          style = "cont",
          title = "Bus Stop Density") +
  tm_borders(col = "grey") +
  
tm_shape(food_bev) +
  tm_dots() + 
  tm_layout(main.title = "Location of Food & Beverage",
            main.title.position = "center",
            main.title.size = 1,
            frame = TRUE)  +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)

Perform point-in-polygon count

hex_grid_bounded3$food_bev_count <- lengths(st_intersects(hex_grid_bounded3, food_bev))

summary(hex_grid_bounded3$food_bev_count)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   0.000   0.000   1.147   0.000 113.000 
tm_shape(hex_grid_bounded3) +
  tm_fill(col = "food_bev_count",
          palette = "Blues",
          style = "cont", 
          title = "Food & Beverage Density") + 
  tm_borders(col = "grey")  +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)

Retail

retail <- st_read("data/geospatial/Retails.shp") %>% st_transform(crs=3414)
Reading layer `Retails' from data source 
  `C:\guacodemoleh\IS415-GAA\Take-home_Ex\Take-home_Ex03\data\geospatial\Retails.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 37635 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 4737.982 ymin: 25171.88 xmax: 48265.04 ymax: 50135.28
Projected CRS: SVY21 / Singapore TM
tm_shape(hex_grid_bounded3) +
  tm_fill(col = "busstop_count",
          palette = "Blues",
          style = "cont",
          title = "Bus Stop Density") +
  tm_borders(col = "grey") +
  
tm_shape(retail) +
  tm_dots() + 
  tm_layout(main.title = "Location of Retail",
            main.title.position = "center",
            main.title.size = 1,
            frame = TRUE)  +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)

Perform point-in-polygon count

hex_grid_bounded3$retail_count <- lengths(st_intersects(hex_grid_bounded3, retail))

summary(hex_grid_bounded3$retail_count)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    0.0     0.0     0.0    22.5    10.0  1758.0 
tm_shape(hex_grid_bounded3) +
  tm_fill(col = "retail_count",
          palette = "Blues",
          style = "cont", 
          title = "Retail Density") + 
  tm_borders(col = "grey")  +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)

Entertainment

entertn <- st_read("data/geospatial/entertn.shp") %>% st_transform(crs=3414)
Reading layer `entertn' from data source 
  `C:\guacodemoleh\IS415-GAA\Take-home_Ex\Take-home_Ex03\data\geospatial\entertn.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 114 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 10809.34 ymin: 26528.63 xmax: 41600.62 ymax: 46375.77
Projected CRS: SVY21 / Singapore TM
tm_shape(hex_grid_bounded3) +
  tm_fill(col = "busstop_count",
          palette = "Blues",
          style = "cont",
          title = "Bus Stop Density") +
  tm_borders(col = "grey") +
  
tm_shape(entertn) +
  tm_dots() + 
  tm_layout(main.title = "Location of Entertainment",
            main.title.position = "center",
            main.title.size = 1,
            frame = TRUE)  +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)

Perform point-in-polygon count

hex_grid_bounded3$entertn_count <- lengths(st_intersects(hex_grid_bounded3, entertn))

summary(hex_grid_bounded3$entertn_count)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
 0.00000  0.00000  0.00000  0.06814  0.00000 11.00000 
tm_shape(hex_grid_bounded3) +
  tm_fill(col = "entertn_count",
          palette = "Blues",
          style = "cont", 
          title = "Entertainment Density") + 
  tm_borders(col = "grey")  +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)

Prepare Origin & Destination Variables

Origin

propulsive <- hex_grid_bounded3 %>%
  st_drop_geometry() %>%
  select(index, biz_count, school_count, fin_count, hc_count, busstop_count, housing_count, leisure_recre_count, retail_count, entertn_count, food_bev_count)
  

origin <- names(propulsive) %>%
  modify_at(-1, ~ paste0("o_", .))  # Add prefix to all except index

# Assign modified names back to the data frame
names(propulsive) <- origin

Destination

attractiveness <- hex_grid_bounded3 %>%
  st_drop_geometry() %>%
  select(index, biz_count, school_count, fin_count, hc_count, busstop_count, housing_count, leisure_recre_count, retail_count, entertn_count, food_bev_count)
  

destin <- names(propulsive) %>%
  modify_at(-1, ~ paste0("d_", .))  # Add prefix to all except index

# Assign modified names back to the data frame
names(attractiveness) <- destin

Compute Distance Matrix

A distance matrix shows the distance between pairs of locations. A location’s distance from itselfis shown in the main diagonal of a distance matrix table.

In view of time, we will use sp rather than sf.

hex_grid_sp <- as(hex_grid_bounded2, "Spatial")
dist <- spDists(hex_grid_sp, longlat = FALSE)

# resultant matrix
head(dist, n=c(10, 10))
          [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
 [1,]    0.000 1299.038 2598.076 3897.114 5196.152  750.000  750.000 1984.313
 [2,] 1299.038    0.000 1299.038 2598.076 3897.114 1984.313  750.000  750.000
 [3,] 2598.076 1299.038    0.000 1299.038 2598.076 3269.174 1984.313  750.000
 [4,] 3897.114 2598.076 1299.038    0.000 1299.038 4562.072 3269.174 1984.313
 [5,] 5196.152 3897.114 2598.076 1299.038    0.000 5857.687 4562.072 3269.174
 [6,]  750.000 1984.313 3269.174 4562.072 5857.687    0.000 1299.038 2598.076
 [7,]  750.000  750.000 1984.313 3269.174 4562.072 1299.038    0.000 1299.038
 [8,] 1984.313  750.000  750.000 1984.313 3269.174 2598.076 1299.038    0.000
 [9,] 3269.174 1984.313  750.000  750.000 1984.313 3897.114 2598.076 1299.038
[10,] 4562.072 3269.174 1984.313  750.000  750.000 5196.152 3897.114 2598.076
          [,9]    [,10]
 [1,] 3269.174 4562.072
 [2,] 1984.313 3269.174
 [3,]  750.000 1984.313
 [4,]  750.000  750.000
 [5,] 1984.313  750.000
 [6,] 3897.114 5196.152
 [7,] 2598.076 3897.114
 [8,] 1299.038 2598.076
 [9,]    0.000 1299.038
[10,] 1299.038    0.000

The result is a matrix object class and the column and row headers are not labeled with the hexagon grid index representing the TAZ.

Label column and row headers

Create a list sorted according to the distance matrix by Traffic Analytical Zones.

hex_names <- hex_grid_bounded2$index

Attach the hexagon grid index to the rows and columns for distance matrix matching.

colnames(dist) <- paste0(hex_names)
rownames(dist) <- paste0(hex_names)

Pivot distance value by hexagon grid index

The distance matrix is pivoted into a long table by using the melt() function of reshape2.

distPair <- reshape2::melt(dist) %>%
  rename(dist = value)

head(distPair, 10)
   Var1 Var2     dist
1     1    1    0.000
2     2    1 1299.038
3     3    1 2598.076
4     4    1 3897.114
5     5    1 5196.152
6     6    1  750.000
7     7    1  750.000
8     8    1 1984.313
9     9    1 3269.174
10   10    1 4562.072
  • The intra-zonal distances are 0.

Update intra-zonal distances

Check for the lowest non-zero distance value.

distPair %>%
  filter(dist > 0) %>%
  summary()
      Var1           Var2           dist      
 Min.   :   1   Min.   :   1   Min.   :  750  
 1st Qu.: 419   1st Qu.: 419   1st Qu.: 9808  
 Median : 837   Median : 837   Median :15660  
 Mean   : 837   Mean   : 837   Mean   :16748  
 3rd Qu.:1255   3rd Qu.:1255   3rd Qu.:22362  
 Max.   :1673   Max.   :1673   Max.   :54750  

Since the lowest is 750m, any distance less than 750m can represent the intra-zonal distance. For consistency, 375m is appended to intra-zonal distances.

distPair$dist <- ifelse(distPair$dist == 0,
                        375, distPair$dist)

summary(distPair)
      Var1           Var2           dist      
 Min.   :   1   Min.   :   1   Min.   :  375  
 1st Qu.: 419   1st Qu.: 419   1st Qu.: 9808  
 Median : 837   Median : 837   Median :15660  
 Mean   : 837   Mean   : 837   Mean   :16738  
 3rd Qu.:1255   3rd Qu.:1255   3rd Qu.:22362  
 Max.   :1673   Max.   :1673   Max.   :54750  

Rename the origin and destination fields and convert into factor data type.

distPair <- distPair %>%
  rename(orig = Var1,
         dest = Var2) %>%
  mutate(across(c(orig, dest), as.factor))

Separating intra-flow from passenger volume

A new column FlowNoIntra is created to differentiate intra-zone trips from inter-zone trips based on the comparison of origin and destination zones.

od_data$FlowNoIntra <- ifelse(
  od_data$ORIGIN_hex == od_data$DESTIN_hex, 0, od_data$TOTAL_TRIPS)
od_data$offset <- ifelse(
  od_data$ORIGIN_hex == od_data$DESTIN_hex, 0.000001, 1)
od_data <- od_data%>%
  filter(FlowNoIntra > 0)

Combining passenger volume data with distance value

Convert the data value type into factor for the origin and destination hex indexes.

od_data$ORIGIN_hex <- as.factor(od_data$ORIGIN_hex)
od_data$DESTIN_hex <- as.factor(od_data$DESTIN_hex)

Then perform a left join to join the OD data and distPair. Store it under flow_data.

flow_data <- od_data %>%
  left_join (distPair,
             by = c("ORIGIN_hex" = "orig",
                    "DESTIN_hex" = "dest"))

Preparing origin attributes

flow_data <- flow_data %>%
  left_join (propulsive,
             by = c("ORIGIN_hex" = "index"))

Preparing destination attributes

flow_data <- flow_data %>%
  left_join (attractiveness,
             by = c("DESTIN_hex" = "index"))

Check for variables with zero values

For log transformation, log 0 is undefined so it is critical to ensure there are no zero values in the explanatory variables.

   ORIGIN_hex        DESTIN_hex        DAY_TYPE         TIME_PER_HOUR  
 1126   :   9528   1126   :   9225   Length:1809270     Min.   : 0.00  
 1349   :   9207   695    :   9052   Class :character   1st Qu.:10.00  
 1258   :   9110   1349   :   8999   Mode  :character   Median :14.00  
 1107   :   8728   1258   :   8927                      Mean   :13.88  
 695    :   8700   1342   :   8811                      3rd Qu.:18.00  
 1342   :   8693   1107   :   8667                      Max.   :23.00  
 (Other):1755304   (Other):1755589                                     
  TOTAL_TRIPS        FlowNoIntra           offset       dist      
 Min.   :     1.0   Min.   :     1.0   Min.   :1   Min.   :  750  
 1st Qu.:     3.0   1st Qu.:     3.0   1st Qu.:1   1st Qu.: 2704  
 Median :    13.0   Median :    13.0   Median :1   Median : 4684  
 Mean   :   143.9   Mean   :   143.9   Mean   :1   Mean   : 5742  
 3rd Qu.:    58.0   3rd Qu.:    58.0   3rd Qu.:1   3rd Qu.: 7937  
 Max.   :134829.0   Max.   :134829.0   Max.   :1   Max.   :24784  
                                                                  
  o_biz_count    o_school_count    o_fin_count       o_hc_count    
 Min.   : 0.00   Min.   :0.0000   Min.   : 0.000   Min.   : 0.000  
 1st Qu.: 0.00   1st Qu.:0.0000   1st Qu.: 1.000   1st Qu.: 0.000  
 Median : 1.00   Median :0.0000   Median : 3.000   Median : 1.000  
 Mean   : 5.05   Mean   :0.5816   Mean   : 5.663   Mean   : 2.342  
 3rd Qu.: 5.00   3rd Qu.:1.0000   3rd Qu.: 8.000   3rd Qu.: 4.000  
 Max.   :99.00   Max.   :4.0000   Max.   :44.000   Max.   :17.000  
                                                                   
 o_busstop_count  o_housing_count o_leisure_recre_count o_retail_count  
 Min.   : 1.000   Min.   :   0    Min.   : 0.000        Min.   :  0.00  
 1st Qu.: 6.000   1st Qu.:   0    1st Qu.: 0.000        1st Qu.:  7.00  
 Median : 8.000   Median :1321    Median : 1.000        Median : 26.00  
 Mean   : 7.831   Mean   :2019    Mean   : 1.536        Mean   : 58.96  
 3rd Qu.:10.000   3rd Qu.:3761    3rd Qu.: 2.000        3rd Qu.: 75.00  
 Max.   :20.000   Max.   :7879    Max.   :23.000        Max.   :647.00  
                                                                        
 o_entertn_count o_food_bev_count d_o_biz_count    d_o_school_count
 Min.   :0.000   Min.   :  0.00   Min.   : 0.000   Min.   :0.000   
 1st Qu.:0.000   1st Qu.:  0.00   1st Qu.: 0.000   1st Qu.:0.000   
 Median :0.000   Median :  0.00   Median : 1.000   Median :0.000   
 Mean   :0.139   Mean   :  1.81   Mean   : 5.055   Mean   :0.581   
 3rd Qu.:0.000   3rd Qu.:  1.00   3rd Qu.: 5.000   3rd Qu.:1.000   
 Max.   :5.000   Max.   :110.00   Max.   :99.000   Max.   :4.000   
                                                                   
 d_o_fin_count     d_o_hc_count    d_o_busstop_count d_o_housing_count
 Min.   : 0.000   Min.   : 0.000   Min.   : 1.000    Min.   :   0     
 1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 6.000    1st Qu.:   0     
 Median : 3.000   Median : 1.000   Median : 8.000    Median :1321     
 Mean   : 5.628   Mean   : 2.344   Mean   : 7.831    Mean   :2027     
 3rd Qu.: 8.000   3rd Qu.: 4.000   3rd Qu.:10.000    3rd Qu.:3793     
 Max.   :44.000   Max.   :17.000   Max.   :20.000    Max.   :7879     
                                                                      
 d_o_leisure_recre_count d_o_retail_count d_o_entertn_count d_o_food_bev_count
 Min.   : 0.000          Min.   :  0.00   Min.   :0.0000    Min.   :  0.000   
 1st Qu.: 0.000          1st Qu.:  7.00   1st Qu.:0.0000    1st Qu.:  0.000   
 Median : 1.000          Median : 26.00   Median :0.0000    Median :  0.000   
 Mean   : 1.524          Mean   : 58.78   Mean   :0.1352    Mean   :  1.815   
 3rd Qu.: 2.000          3rd Qu.: 76.00   3rd Qu.:0.0000    3rd Qu.:  1.000   
 Max.   :23.000          Max.   :647.00   Max.   :5.0000    Max.   :110.000   
                                                                              

All except the bus stop count variables contain zero values and need to be replaced with a negligible value like 0.99.

flow_data <- flow_data %>%
  mutate_at(vars(ends_with("_count")), ~ ifelse(. == 0, 0.99, .))

write_rds(flow_data, "data/rds/flow_data.rds")
flow_data <- read_rds("data/rds/flow_data.rds")

Apply log transformation

Poisson regression is based on log, so log() has to be applied to all our explanatory variables before calibrating the various spatial interaction models.

flow_data_log <- flow_data %>%
  mutate_at(vars(ends_with("_count")), log) %>%
  mutate(dist = log(dist))

summary(flow_data_log)
   ORIGIN_hex        DESTIN_hex        DAY_TYPE         TIME_PER_HOUR  
 1126   :   9528   1126   :   9225   Length:1809270     Min.   : 0.00  
 1349   :   9207   695    :   9052   Class :character   1st Qu.:10.00  
 1258   :   9110   1349   :   8999   Mode  :character   Median :14.00  
 1107   :   8728   1258   :   8927                      Mean   :13.88  
 695    :   8700   1342   :   8811                      3rd Qu.:18.00  
 1342   :   8693   1107   :   8667                      Max.   :23.00  
 (Other):1755304   (Other):1755589                                     
  TOTAL_TRIPS        FlowNoIntra           offset       dist       
 Min.   :     1.0   Min.   :     1.0   Min.   :1   Min.   : 6.620  
 1st Qu.:     3.0   1st Qu.:     3.0   1st Qu.:1   1st Qu.: 7.903  
 Median :    13.0   Median :    13.0   Median :1   Median : 8.452  
 Mean   :   143.9   Mean   :   143.9   Mean   :1   Mean   : 8.380  
 3rd Qu.:    58.0   3rd Qu.:    58.0   3rd Qu.:1   3rd Qu.: 8.979  
 Max.   :134829.0   Max.   :134829.0   Max.   :1   Max.   :10.118  
                                                                   
  o_biz_count       o_school_count      o_fin_count         o_hc_count      
 Min.   :-0.01005   Min.   :-0.01005   Min.   :-0.01005   Min.   :-0.01005  
 1st Qu.:-0.01005   1st Qu.:-0.01005   1st Qu.: 0.00000   1st Qu.:-0.01005  
 Median : 0.00000   Median :-0.01005   Median : 1.09861   Median : 0.00000  
 Mean   : 0.78921   Mean   : 0.11034   Mean   : 1.19420   Mean   : 0.66403  
 3rd Qu.: 1.60944   3rd Qu.: 0.00000   3rd Qu.: 2.07944   3rd Qu.: 1.38629  
 Max.   : 4.59512   Max.   : 1.38629   Max.   : 3.78419   Max.   : 2.83321  
                                                                            
 o_busstop_count o_housing_count    o_leisure_recre_count o_retail_count    
 Min.   :0.000   Min.   :-0.01005   Min.   :-0.01005      Min.   :-0.01005  
 1st Qu.:1.792   1st Qu.:-0.01005   1st Qu.:-0.01005      1st Qu.: 1.94591  
 Median :2.079   Median : 7.18614   Median : 0.00000      Median : 3.25810  
 Mean   :1.939   Mean   : 5.15444   Mean   : 0.39771      Mean   : 3.08542  
 3rd Qu.:2.303   3rd Qu.: 8.23244   3rd Qu.: 0.69315      3rd Qu.: 4.31749  
 Max.   :2.996   Max.   : 8.97196   Max.   : 3.13549      Max.   : 6.47235  
                                                                            
 o_entertn_count    o_food_bev_count   d_o_biz_count      d_o_school_count  
 Min.   :-0.01005   Min.   :-0.01005   Min.   :-0.01005   Min.   :-0.01005  
 1st Qu.:-0.01005   1st Qu.:-0.01005   1st Qu.:-0.01005   1st Qu.:-0.01005  
 Median :-0.01005   Median :-0.01005   Median : 0.00000   Median :-0.01005  
 Mean   : 0.01453   Mean   : 0.32800   Mean   : 0.78558   Mean   : 0.10923  
 3rd Qu.:-0.01005   3rd Qu.: 0.00000   3rd Qu.: 1.60944   3rd Qu.: 0.00000  
 Max.   : 1.60944   Max.   : 4.70048   Max.   : 4.59512   Max.   : 1.38629  
                                                                            
 d_o_fin_count       d_o_hc_count      d_o_busstop_count d_o_housing_count 
 Min.   :-0.01005   Min.   :-0.01005   Min.   :0.000     Min.   :-0.01005  
 1st Qu.:-0.01005   1st Qu.:-0.01005   1st Qu.:1.792     1st Qu.:-0.01005  
 Median : 1.09861   Median : 0.00000   Median :2.079     Median : 7.18614  
 Mean   : 1.18753   Mean   : 0.66415   Mean   :1.937     Mean   : 5.17345  
 3rd Qu.: 2.07944   3rd Qu.: 1.38629   3rd Qu.:2.303     3rd Qu.: 8.24091  
 Max.   : 3.78419   Max.   : 2.83321   Max.   :2.996     Max.   : 8.97196  
                                                                           
 d_o_leisure_recre_count d_o_retail_count   d_o_entertn_count 
 Min.   :-0.01005        Min.   :-0.01005   Min.   :-0.01005  
 1st Qu.:-0.01005        1st Qu.: 1.94591   1st Qu.:-0.01005  
 Median : 0.00000        Median : 3.25810   Median :-0.01005  
 Mean   : 0.39480        Mean   : 3.08702   Mean   : 0.01390  
 3rd Qu.: 0.69315        3rd Qu.: 4.33073   3rd Qu.:-0.01005  
 Max.   : 3.13549        Max.   : 6.47235   Max.   : 1.60944  
                                                              
 d_o_food_bev_count
 Min.   :-0.01005  
 1st Qu.:-0.01005  
 Median :-0.01005  
 Mean   : 0.32654  
 3rd Qu.: 0.00000  
 Max.   : 4.70048  
                   

Spatial Interaction Modelling

SIM is a mathematical model predicting the movement of people between origins (like homes) and destinations by examining the distance between them.

In a healthcare context, a SIM can account the likely demand for health services and the quality of service provision at health centres. Conventionally, SIM’s cousin term is gravity model.

Variable Construction

# Generate propulsive variables names
origin_var <- propulsive %>%
  select(-(index)) %>%
  names()

# Generate attractiveness variables names
destin_var <- attractiveness %>%
  select(-(index)) %>%
  names()

Origin Constrained Model

# Generate the formula dynamically
formula_string <- paste("TOTAL_TRIPS ~ ORIGIN_hex +", 
                        paste(destin_var, collapse = " + "), "+ dist - 1")

# Convert the string to a formula
formula <- as.formula(formula_string)

orcSIM <- glm(formula,
              family = poisson(link = "log"),
              data = flow_data_log,
              na.action = na.exclude)

write_rds(orcSIM, "data/rds/decSIM.rds")
orcSIM <- read_rds("data/rds/orcSIM.rds")
summary(orcSIM)

Destination Constrained Model

# Generate the formula dynamically
formula_string <- paste("TOTAL_TRIPS ~ DESTIN_hex +", 
                        paste(destin_var, collapse = " + "), "+ dist - 1")

# Convert the string to a formula
formula <- as.formula(formula_string)

decSIM <- glm(formula,
              family = poisson(link = "log"),
              data = flow_data_log,
              na.action = na.exclude)

write_rds(decSIM, "data/rds/decSIM.rds")
decSIM <- read_rds("data/rds/decSIM.rds")
summary(decSIM)

Doubly Constrained Model

dbcSIM <- glm(formula = TOTAL_TRIPS ~ 
                ORIGIN_hex + 
                DESTIN_hex + 
                dist,
              family = poisson(link = "log"),
              data = flow_data_log,
              na.action = na.exclude)

write_rds(dbcSIM, "data/rds/dbcSIM.rds")
dbcSIM <- read_rds("data/rds/decSIM.rds")
summary(dbcSIM)

Prototype for Exploratory Spatial Data Analysis

In light of our class project, I aim to test the feasibility of plotting out the various commuting flow maps.

Specifically, the maps can focus on the various areas of interest and plot the flow lines originating and ending in AOIs of ranged densities.

The analytical unit could be planning area, subzone or even hexagonal.

Before diving into these analytical units, we can have a look at the general inflow and outflows from origins and destinations.

Level 1: General Perspective

The TOTAL_TRIPS is set to 3000 at minimum due to computational time taken to run the below code chunks.

General Outflows from Origins (Propulsive)

Here, I assume that the user is interested in bus traffic flows and where buses typically leave.

flowLine_org <- filter(flowLine, ORIGIN_hex %in% hex_grid_bounded3$index)
org_hex <- filter(hex_grid_bounded3, index %in% flowLine_org$ORIGIN_hex)

org_hex_3000 <- org_hex[org_hex$index %in% flowLine_org$ORIGIN_hex[flowLine_org$TOTAL_TRIPS >= 3000], ]

tm_shape(hex_grid_bounded3) +
  tm_borders(col = "grey") +
tm_shape(org_hex_3000) + 
  tm_fill(col = "skyblue", alpha = 0.5) +
  tm_borders(col = "grey") +
flowLine_org %>%  
  filter(TOTAL_TRIPS >= 3000) %>%
tm_shape() +
  tm_lines(lwd = "TOTAL_TRIPS",
           scale = c(1,2,3,4,5,7,9),
           n = 6, 
           alpha = 0.5,
           title.lwd = "Total Trips"
           ) + 
  tm_layout(main.title = "Origin: Desire Lines Traffic Analysis Zones \n(Nov '23 - Jan '24)",
            main.title.position = "center",
            main.title.size = 0.6,
            frame = TRUE) +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 

The potential user interface for the above scenario may look like this:

Table for Top Five Propulsive Hexagons (TBC)

Inflows to Destinations (Attractive)

This scenario assumes that the user is interested in the popular bus destinations, i.e. TOTAL_TRIPS is minmally 8000 on weekdays.

flowLine_dest <- filter(flowLine, DESTIN_hex %in% hex_grid_bounded3$index)
dest_hex <- filter(hex_grid_bounded3, index %in% flowLine_dest$DESTIN_hex)

dest_hex_8000 <- dest_hex[dest_hex$index %in% flowLine_dest$DESTIN_hex[flowLine_dest$TOTAL_TRIPS >= 10000], ]

tm_shape(hex_grid_bounded3) +
  tm_borders(col = "grey") +
tm_shape(dest_hex_8000) + 
  tm_fill(col = "skyblue", alpha = 0.5) +
  tm_borders(col = "grey") +
flowLine_dest %>%  
  filter(DAY_TYPE == "WEEKDAY", TOTAL_TRIPS >= 8000) %>%
tm_shape() +
  tm_lines(lwd = "TOTAL_TRIPS",
           scale = c(1,2,3,4,5,7,9),
           n = 6, 
           alpha = 0.5,
           title.lwd = "Total Trips"
           ) + 
  tm_layout(main.title = "Destinations: Desire Lines with at least 8000 trips \nbetween Traffic Analysis Zones for Weekdays \n(Nov '23 - Jan '24)",
            main.title.position = "center",
            main.title.size = 0.6,
            frame = TRUE) +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 

The potential user interface for the above scenario may look like this:

Weekday vs Weekend/Holiday

This scenario simulates the user wanting to find out how do bus destinations differ on weekends compared to weekdays.

tm_shape(hex_grid_bounded3) +
  tm_borders(col = "grey") +
tm_shape(dest_hex_8000) + 
  tm_fill(col = "skyblue", alpha = 0.5) +
  tm_borders(col = "grey") +
flowLine_dest %>%  
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY", TOTAL_TRIPS >= 8000) %>%
tm_shape() +
  tm_lines(lwd = "TOTAL_TRIPS",
           scale = c(1,2,3,4,5,7,9),
           n = 6, 
           alpha = 0.5,
           title.lwd = "Total Trips"
           ) + 
  tm_layout(main.title = "Destinations: Desire Lines with at least 8000 trips \nbetween Traffic Analysis Zones for Weekends \n(Nov '23 - Jan '24)",
            main.title.position = "center",
            main.title.size = 0.6,
            frame = TRUE) +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 

  • Hexagons linked to desire lines reveal that they are more popular starting or destination locations for bus rides.

  • There are many trips made to destinations in the north, which is around Tuas Checkpoint.

The potential user interface for the above scenario may look like this:

Specific Hour of the Day

A user might be curious about bus traffic flows and where they start on weekdays at 8am.

flowLine_org <- filter(flowLine, ORIGIN_hex %in% hex_grid_bounded3$index)
org_hex <- filter(hex_grid_bounded3, index %in% flowLine_org$ORIGIN_hex)

org_hex_3000 <- org_hex[org_hex$index %in% flowLine_org$ORIGIN_hex[flowLine_org$TOTAL_TRIPS >= 3000], ]

tm_shape(hex_grid_bounded3) +
  tm_borders(col = "grey") +
tm_shape(org_hex_3000) + 
  tm_fill(col = "skyblue", alpha = 0.5) +
  tm_borders(col = "grey") +
flowLine_org %>%  
  filter(TOTAL_TRIPS >= 3000, TIME_PER_HOUR == 8) %>%
tm_shape() +
  tm_lines(lwd = "TOTAL_TRIPS",
           scale = c(1,2,3,4,5,7,9),
           n = 6, 
           alpha = 0.5,
           title.lwd = "Total Trips"
           ) + 
  tm_layout(main.title = "Origin: Desire Lines Traffic Analysis Zones Weekdays, 8am \n(Nov '23 - Jan '24)",
            main.title.position = "center",
            main.title.size = 0.6,
            frame = TRUE) +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 

The potential user interface for the above scenario may look like this:

Since too many desire lines are plotted, the user can narrow the minimum number of total trips per origin-destination hexagon pair to 10000.

flowLine_org <- filter(flowLine, ORIGIN_hex %in% hex_grid_bounded3$index)
org_hex <- filter(hex_grid_bounded3, index %in% flowLine_org$ORIGIN_hex)

org_hex_10000 <- org_hex[org_hex$index %in% flowLine_org$ORIGIN_hex[flowLine_org$TOTAL_TRIPS >= 10000], ]

tm_shape(hex_grid_bounded3) +
  tm_borders(col = "grey") +
tm_shape(org_hex_10000) + 
  tm_fill(col = "skyblue", alpha = 0.5) +
  tm_borders(col = "grey") +
flowLine_org %>%  
  filter(TOTAL_TRIPS >= 10000, TIME_PER_HOUR == 8) %>%
tm_shape() +
  tm_lines(lwd = "TOTAL_TRIPS",
           scale = c(1,2,3,4,5,7,9),
           n = 6, 
           alpha = 0.5,
           title.lwd = "Total Trips"
           ) + 
  tm_layout(main.title = "Origin: Desire Lines between Traffic Analysis Zones with >10000 Trips \n(Nov '23 - Jan '24)",
            main.title.position = "center",
            main.title.size = 0.6,
            frame = TRUE) +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 

The potential user interface for the above scenario may look like this:

Level 2: Planning Area

The below scenario is whereby a user is interested in looking at bus trips starting from a Planning Area like Bedok at 8am on a weekday.

tmap_mode("view")
flowLine_org <- filter(flowLine, ORIGIN_hex %in% hex_grid_bounded3$index)
org_hex <- filter(hex_grid_bounded3, index %in% flowLine_org$ORIGIN_hex)

org_hex_3000 <- org_hex[org_hex$index %in% flowLine_org$ORIGIN_hex[flowLine_org$TOTAL_TRIPS >= 3000], ]

tm_shape(hex_grid_bounded3) +
  tm_borders(col = "grey") +
org_hex_3000 %>%
  filter(PLN_AREA_N == "BEDOK") %>%
tm_shape() + 
  tm_fill(col = "skyblue", alpha = 0.5) +
  tm_borders(col = "grey") +
flowLine_org %>%
  filter(ORIGIN_hex %in% org_hex_3000$index[org_hex_3000$PLN_AREA_N == "BEDOK"]) %>%  
  filter(TOTAL_TRIPS >= 3000, TIME_PER_HOUR == 8) %>%
tm_shape() +
  tm_lines(lwd = "TOTAL_TRIPS",
           scale = c(1,2,3,4,5,7,9),
           n = 6, 
           alpha = 0.5,
           title.lwd = "Total Trips"
           ) + 
  tm_layout(main.title = "Bedok Origin: Desire Lines between Traffic Analysis Zones with >3000 Trips \n(Nov '23 - Jan '24)",
            main.title.position = "center",
            main.title.size = 0.6,
            frame = TRUE) +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 
  • The majority of bus trips at 8 am starting from Bedok Planning Area mostly end within the Planning Area too.

The potential user interface for the above scenario may look like this:

Multiple starting planning areas for bus trips can also be observed.

tmap_mode("view")
flowLine_org <- filter(flowLine, ORIGIN_hex %in% hex_grid_bounded3$index)
org_hex <- filter(hex_grid_bounded3, index %in% flowLine_org$ORIGIN_hex)

org_hex_3000 <- org_hex[org_hex$index %in% flowLine_org$ORIGIN_hex[flowLine_org$TOTAL_TRIPS >= 3000], ]

tm_shape(hex_grid_bounded3) +
  tm_borders(col = "grey") +
org_hex_3000 %>%
  filter(PLN_AREA_N %in% c("BEDOK", "PUNGGOL", "PAYA LEBAR")) %>%
tm_shape() + 
  tm_fill(col = "PLN_AREA_N", palette = "Dark2", alpha = 0.5) +
  tm_borders(col = "grey") +
flowLine_org %>%
  filter(ORIGIN_hex %in% org_hex_3000$index[org_hex_3000$PLN_AREA_N %in% c("BEDOK", "PUNGGOL", "PAYA LEBAR")]) %>%  
  filter(TOTAL_TRIPS >= 3000, TIME_PER_HOUR == 8) %>%
left_join(org_hex_3000 %>% distinct(index, PLN_AREA_N), by = c("ORIGIN_hex" = "index")) %>%
  tm_shape() +
  tm_lines(lwd = "TOTAL_TRIPS",
           col = "PLN_AREA_N",
           palette = "Dark2",
           scale = c(1, 2, 3, 4, 5, 7, 9),
           alpha = 0.5
  ) + 
  tm_layout(main.title = "Bedok, Paya Lebar & Punggol Origin: Desire Lines between \nTraffic Analysis Zones with >3000 Trips (Nov '23 - Jan '24)",
            main.title.position = "center",
            main.title.size = 0.6,
            frame = TRUE) +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) +
tm_view(set.zoom.limits = c(12,30))
  • There are more origin locations at Bedok than Punggol at 8am.

  • There are noticeable outgoing bus trips from Punggol to Bedok Planning Area at 8am.

The potential user interface for the above scenario may look like this:

Level 3: Subzone

The user might also want to view bus travels at the subzone level, such as those starting from subzone Mathilda in Punggol planning area.

tmap_mode("plot")
flowLine_org <- filter(flowLine, ORIGIN_hex %in% hex_grid_bounded3$index)
org_hex <- filter(hex_grid_bounded3, index %in% flowLine_org$ORIGIN_hex)

org_hex_3000 <- org_hex[org_hex$index %in% flowLine_org$ORIGIN_hex[flowLine_org$TOTAL_TRIPS >= 3000], ]

tm_shape(hex_grid_bounded3) +
  tm_borders(col = "grey") +
org_hex_3000 %>%
  filter(SUBZONE_N == "MATILDA") %>%
tm_shape() + 
  tm_fill(col = "skyblue", alpha = 0.5) +
  tm_borders(col = "grey") +
flowLine_org %>%
  filter(ORIGIN_hex %in% org_hex_3000$index[org_hex_3000$SUBZONE_N == "MATILDA"]) %>%  
  filter(TOTAL_TRIPS >= 3000, TIME_PER_HOUR == 7) %>%
tm_shape() +
  tm_lines(lwd = "TOTAL_TRIPS",
           scale = c(1,2,3,4,5,7,9),
           n = 6, 
           alpha = 0.5,
           title.lwd = "Total Trips"
           ) + 
  tm_layout(main.title = "Punggol Matilda Origin: Desire Lines between Traffic Analysis Zones with >3000 Trips \n(Nov '23 - Jan '24)",
            main.title.position = "center",
            main.title.size = 0.6,
            frame = TRUE) +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 

  • Between 7am and 8am, there are many trips from Matilda subzone to its immediate east hexagon neighbour that is also within the Punggol Planning Area.

Town Councils have recently been pushing for silver zones which promotes elderly-friendly paths along with school zones. Traffic calming in such zones may affect bus routes and it is important to check the volume of inter Planning Area bus routes. Planners can use the multiple-subzones perspective as a proxy of the potential economic costs of constructing such zones:

tmap_mode("view")
flowLine_org <- filter(flowLine, ORIGIN_hex %in% hex_grid_bounded3$index)
org_hex <- filter(hex_grid_bounded3, index %in% flowLine_org$ORIGIN_hex)

org_hex_3000 <- org_hex[org_hex$index %in% flowLine_org$ORIGIN_hex[flowLine_org$TOTAL_TRIPS >= 3000], ]

org_hex_3000 <- org_hex_3000[, c("index", setdiff(names(org_hex_3000), "index"))]

hex_grid_bounded3 <- hex_grid_bounded3[, c("index", setdiff(names(hex_grid_bounded3), "index"))]

tm_shape(hex_grid_bounded3) +
  tm_borders(col = "grey") +
  tm_fill(col = "lightgrey", alpha = 0.2) +
org_hex_3000 %>%
  filter(SUBZONE_N %in% c("WATERWAY EAST", "NORTHSHORE", "MATILDA", "PUNGGOL FIELD", "CONEY ISLAND")) %>%
tm_shape() + 
  tm_fill(col = "SUBZONE_N", palette = "Dark2", alpha = 0.5) +
  tm_borders(col = "grey") +
flowLine_org %>%
  filter(ORIGIN_hex %in% org_hex_3000$index[org_hex_3000$SUBZONE_N %in% c("WATERWAY EAST", "NORTHSHORE", "MATILDA", "PUNGGOL FIELD", "CONEY ISLAND")]) %>%  
  filter(DAY_TYPE == "WEEKDAY",TOTAL_TRIPS >= 3000, TIME_PER_HOUR == 7) %>%
left_join(org_hex_3000 %>% distinct(index, SUBZONE_N), by = c("ORIGIN_hex" = "index")) %>%
  tm_shape() +
  tm_lines(lwd = "TOTAL_TRIPS",
           col = "SUBZONE_N",
           palette = "Dark2",
           scale = c(1, 2, 3, 4, 5, 7, 9),
           alpha = 0.5
  ) + 
  tm_layout(main.title = "Punggol Subzones Origin: Desire Lines between \nTraffic Analysis Zones with >3000 Trips (Nov '23 - Jan '24)",
            main.title.position = "center",
            main.title.size = 0.6,
            frame = TRUE) +
  tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) +
tm_view(set.zoom.limits = c(12,20))
Tip
  • On weekdays, 7-8am, there has been a high total bus traffic of 29,460 trips from Matilda (hex 1368) to Punggol Field (hex 1390), and 29,178 trips from Punggol Field (hex 1390) to Waterway East (hex 1412).

  • There is also high bus traffic of 14,581 trips within Punggol Field subzone, from hex 1390 to hex 1400. Should there be construction plans in these hexagonal areas, the relevant government agencies or even the town council can advice residents on alternative routes to go to their respective destinations.

The potential user interface for the above scenario may look like this:

Validating Spatial Interaction Models

As a teaser to SIMs, users can have a look at whether the volume of bus trips could be influenced by areas of interests within the vicinity.

From the previous map, we can see that there is a total high bus traffic of 14,581 from Punggol Field hex 1400 to Bedok North (hex 1515) from 7-8am on weekdays.

tmap_mode("view")
tm_shape(hex_grid_bounded3) +
  tm_borders(col = "grey") +
org_hex_3000 %>%
  filter(SUBZONE_N %in% c("WATERWAY EAST", "NORTHSHORE", "MATILDA", "PUNGGOL FIELD", "CONEY ISLAND")) %>%
tm_shape() + 
  tm_fill(col = "SUBZONE_N", palette = "Dark2", alpha = 0.5) +
  tm_borders(col = "grey") +
flowLine_org %>%
  filter(ORIGIN_hex %in% org_hex_3000$index[org_hex_3000$SUBZONE_N %in% c("WATERWAY EAST", "NORTHSHORE", "MATILDA", "PUNGGOL FIELD", "CONEY ISLAND")]) %>%  
  filter(TOTAL_TRIPS >= 3000, TIME_PER_HOUR == 7) %>%
left_join(org_hex_3000 %>% distinct(index, SUBZONE_N), by = c("ORIGIN_hex" = "index")) %>%
  tm_shape() +
  tm_lines(lwd = "TOTAL_TRIPS",
           col = "SUBZONE_N",
           palette = "Dark2",
           scale = c(1, 2, 3, 4, 5, 7, 9),
           alpha = 0.5
  ) + 
tm_shape(biz) +
  tm_dots(col = "blue") +
tm_layout(main.title = "Punggol Subzones Origin: Desire Lines between \nTraffic Analysis Zones with >3000 Trips (Nov '23 - Jan '24)",
          main.title.position = "center",
          main.title.size = 0.6,
          frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) +
tm_view(set.zoom.limits = c(12,20))
tmap_mode("plot")
Note
  • Perhaps on weekdays 8-9am, Bedok North hex 1515 is a popular destination for the population living in Punggol Field hex 1400 due to the numerous businesses in the former hex which is also an industrial area.

Follow-up Actions

The space for the commute flow maps could be further utilised with a data table showing the top statistics based on the features the user has selected.